Abstract. Formal behavioral specifications written early in the system-design 
process and communicated across all design phases have been shown to increase 
the efficiency, consistency, and quality of the system under development. To pre- 
vent introducing design or verification errors, it is crucial to test specifications 
for satisfiability. Our focus here is on specifications expressed in linear temporal 
logic (LTL). 

We introduce a novel encoding of symbolic transition-based Büchi automata and
a novel, “sloppy,” transition encoding, both of which result in improved scalability.
We also define novel BDD variable orders based on tree decomposition of
formula parse trees. We describe and extensively test a new multi-encoding approach
utilizing these novel encoding techniques to create 30 encoding variations.
We show that our novel encodings translate to significant, sometimes exponential,
improvement over the current standard encoding for symbolic LTL satisfiability
checking.


1 Introduction 

In property-based design, formal properties, expressed in temporal logics such as LTL [31],
are written early in the system-design process and communicated across all design
phases to increase the efficiency, consistency, and quality of the system under
development [34, 36]. Property-based design and other design-for-verification techniques
capture design intent precisely, and use formal logic properties both to guide the design
process and to integrate verification into the design process [24]. The shift to specifying
desired system behavior in terms of formal logic properties risks introducing
specification errors in this very initial phase of system design, raising the need for property
assurance [30, 34].

The need for checking for errors in formal LTL properties expressing desired system
behavior first arose in the context of model checking, where vacuity checking aims
at reducing the likelihood that a property that is satisfied by the model under
verification is an erroneous property [2, 27]. Property assurance is more challenging at the
initial phases of property-based design, before a model of the implementation has been
specified. Inherent vacuity checking is a set of sanity checks that can be applied to a
set of temporal properties, even before a model of the system has been developed, but
many possible errors cannot be detected by inherent vacuity checking [19].

A stronger sanity check for a set of temporal properties is LTL realizability check- 
ing, in which we test whether there is an open system that satisfies all the properties 
in the set [32], but such a test is very expensive computationally. In LTL satisfiability 
checking, we test whether there is a closed system that satisfies all the properties in 
the set. The satisfiability test is weaker than the realizability test, but its complexity is 
lower; it has the same complexity as LTL model checking [39]. In fact, LTL satisfiability 
checking can be implemented via LTL model checking; see below. 

Indeed, the need for LTL satisfiability checking is widely recognized [14, 23, 25,
28, 35]. Foremost, it serves to ensure that the behavioral description of a system is
internally consistent and neither over- nor under-constrained. If an LTL property is either
valid or unsatisfiable, this must be due to an error. Consider, for example, the
specification always($b_1 \rightarrow$ eventually $b_2$), where $b_1$ and $b_2$ are propositional formulas. If
$b_2$ is a tautology, then this property is valid. If $b_2$ is a contradiction, then this
property is unsatisfiable. Furthermore, the collective set of properties describing a system
must be satisfiable, to avoid contradictions between different requirements.
Satisfiability checking is particularly important when the set of properties describing the design
intent continues to evolve, as properties are added and refined, and have to be checked
repeatedly. Because of the need to consider large sets of properties, it is critical that the
satisfiability test be scalable, and able to handle complex temporal properties. This is
challenging, as LTL satisfiability is known to be PSPACE-complete [39].

As pointed out in [35], satisfiability checking can be performed via model checking:
a universal model (that is, a model that allows all possible traces) does not satisfy
a linear temporal property $\neg f$ precisely when $f$ is satisfiable. In [35] we explored the
effectiveness of model checkers as LTL satisfiability checkers. We compared there the
performance of explicit-state and symbolic model checkers. Both use the
automata-theoretic approach [43] but in a different way. Explicit-state model checkers translate
LTL formulas to Büchi automata explicitly and then use an explicit graph-search
algorithm [11]. For satisfiability checking, the construction of the automaton is the more
demanding task. Symbolic model checkers construct symbolic encodings of automata
and then use a symbolic nonemptiness test. The symbolic construction of the automaton
is easy, but the nonemptiness test is computationally demanding. The extensive set of
experiments described in [35] showed that the symbolic approach to LTL satisfiability
is significantly superior to the explicit-state approach in terms of scalability.

In the context of explicit-state model checking, there has been extensive research
on optimized construction of automata from LTL formulas [12, 13, 20-22, 38, 40, 41],
where a typical goal is to minimize the size of constructed automata [42]. Optimizing
the construction of symbolic automata is more difficult, as the size of the symbolic
representation does not correspond directly to its optimality. An initial symbolic encoding
of automata was proposed in [6], but the optimized encoding we call CGH, proposed
by Clarke, Grumberg, and Hamaguchi [10], has become the de facto standard encoding.
CGH encoding is used by model checkers such as CadenceSMV and NuSMV, and has
been extended to symbolic encodings of industrial specification languages [9].
Surprisingly, there has been little follow-up research on this topic.

In this paper, we propose novel symbolic LTL-to-automata translations and utilize
them in a new multi-encoding approach to achieve significant, sometimes exponential,
improvement over the current standard encoding for LTL satisfiability checking. First,
we introduce and prove the correctness of a novel encoding of symbolic automata
inspired by optimized constructions of explicit automata [12, 22]. While the CGH encoding
uses Generalized Büchi Automata (GBA), our new encoding is based on Transition-Based
Generalized Büchi Automata (TGBA). Second, inspired by work on symbolic satisfiability
checking for modal logic [29], we introduce here a novel sloppy encoding of symbolic
automata, as opposed to the fussy encoding used in CGH. Sloppy encoding uses looser
constraints, which sometimes results in smaller BDDs. The sloppy approach can be
applied both to GBA-based and TGBA-based encodings, provided that one uses negation
normal form (NNF) [40], rather than the Boolean normal form (BNF) used in CGH.
Finally, we introduce several new variable-ordering schemes, based on tree decomposition
of the LTL parse tree, inspired by observations that relate tree decompositions to
BDD variable ordering [17]. The combination of GBA/TGBA, fussy/sloppy, BNF/NNF,
and different variable orders yields a space of 30 possible configurations of symbolic
automata encodings. (Not all combinations yield viable configurations.)

Since the value of novel encoding techniques lies in increased scalability, we evalu- 
ate our novel encodings in the context of LTL satisfiability checking, utilizing a compre- 
hensive and challenging collection of widely-used benchmark formulas [7, 14, 23, 35]. 
For each formula, we perform satisfiability checking using all 30 encodings. (We use 
CadenceSMV as our experimental platform.) Our results demonstrate conclusively that 
no single encoding performs best across our large benchmark suite. Furthermore, no single
approach (GBA vs. TGBA, fussy vs. sloppy, BNF vs. NNF, or any one variable order)
is dominant. This is consistent with the observation made by others [1, 42] that in the
context of symbolic techniques one typically does not find a “winning” algorithmic con- 
figuration. In response, we developed a multi-encoding tool, PANDA, which runs sev- 
eral encodings in parallel, terminating when the first process returns. Our experiments 
demonstrate conclusively that the multi-encoding approach using the novel encodings 
invented in this paper achieves substantial improvement over CGH, the current standard 
encoding; in fact PANDA significantly bested the native LTL model checker built into 
CadenceSMV. 

The structure of this paper is as follows. We review the CGH encoding [10] in 
Section 2. Next, in Section 3, we describe our novel symbolic TGBA encoding. We 
introduce our novel sloppy encoding and our new methods for choosing BDD variable 
orderings and discuss our space of symbolic encoding techniques in Section 4. After 
setting up our scalability experiment in Section 5, we present our test results in Section 
6, followed by a discussion in Section 7. Though our construction can be used with 
different symbolic model checking tools, in this paper, we follow the convention of [10] 
and give examples of all constructions using the SMV syntax. 

2 Preliminaries 


We assume familiarity with LTL [16]; for convenience, Appendix A defines LTL
semantics. We use two normal forms:

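Definition 1 Boolean Normal Form (BNF) rewrites the input formula to use only
$\neg$, $\vee$, $\mathcal{X}$, $\mathcal{U}$, and $\Diamond$. In other words, we replace $\wedge$, $\rightarrow$, $\mathcal{R}$, and $\Box$ with their equivalents:

$g_1 \wedge g_2 \equiv \neg(\neg g_1 \vee \neg g_2)$
$g_1 \rightarrow g_2 \equiv \neg g_1 \vee g_2$
$g_1\,\mathcal{R}\,g_2 \equiv \neg(\neg g_1\,\mathcal{U}\,\neg g_2)$
$\Box g_1 \equiv \neg\Diamond\neg g_1$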


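Definition 2 Negation Normal Form (NNF) pushes negation inwards until only atomic
propositions are negated, using the following rules:

$\neg\neg g \equiv g$
$\neg(g_1 \wedge g_2) \equiv (\neg g_1) \vee (\neg g_2)$
$\neg(g_1 \vee g_2) \equiv (\neg g_1) \wedge (\neg g_2)$
$(g_1 \rightarrow g_2) \equiv (\neg g_1) \vee g_2$
$\neg(\mathcal{X}g) \equiv \mathcal{X}(\neg g)$
$\neg(g_1\,\mathcal{U}\,g_2) \equiv (\neg g_1\,\mathcal{R}\,\neg g_2)$
$\neg(g_1\,\mathcal{R}\,g_2) \equiv (\neg g_1\,\mathcal{U}\,\neg g_2)$
$\neg(\Box g) \equiv \Diamond(\neg g)$
$\neg(\Diamond g) \equiv \Box(\neg g)$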
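
The rewriting rules of Definition 2 can be sketched as a short recursive function. This is only an illustration under stated assumptions: the nested-tuple formula representation and the operator tags ("ap", "not", "and", "or", "implies", "X", "U", "R", "G", "F") are hypothetical choices made for this example, not the data structures used by PANDA.

```python
# Hypothetical formula representation (an assumption of this sketch):
#   ("ap", "a"), ("not", g), ("and", g1, g2), ("or", g1, g2),
#   ("implies", g1, g2), ("X", g), ("U", g1, g2), ("R", g1, g2),
#   ("G", g) for "always", ("F", g) for "eventually".

# Each operator paired with its dual under negation.
DUAL = {"and": "or", "or": "and", "U": "R", "R": "U", "G": "F", "F": "G"}


def nnf(f, negate=False):
    """Return f (or its negation, if negate is True) in negation normal form."""
    op = f[0]
    if op == "ap":
        return ("not", f) if negate else f
    if op == "not":
        # Absorb the negation; a double negation cancels out.
        return nnf(f[1], not negate)
    if op == "implies":
        # g1 -> g2 is rewritten to (not g1) or g2 before recursing.
        return nnf(("or", ("not", f[1]), f[2]), negate)
    if op == "X":
        # Negation passes through X unchanged.
        return ("X", nnf(f[1], negate))
    new_op = DUAL[op] if negate else op
    return (new_op,) + tuple(nnf(g, negate) for g in f[1:])


a, b = ("ap", "a"), ("ap", "b")
print(nnf(("not", ("U", a, b))))
print(nnf(("not", ("G", ("implies", a, ("F", b))))))
```

Carrying the pending negation as a flag, rather than rewriting the tree repeatedly, keeps the conversion linear in the size of the formula.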


In automata-theoretic model checking, we represent LTL formulas with Büchi automata.


Definition 3 A Generalized Büchi Automaton (GBA) is a quintuple $(Q, \Sigma, \delta, Q_0, F)$,
where:

• $Q$ is a finite set of states.
• $\Sigma$ is a finite alphabet.
• $\delta \subseteq Q \times \Sigma \times Q$ is a transition relation.
• $Q_0 \subseteq Q$ is a set of initial states.
• $F \subseteq 2^Q$ is a set of accepting state sets.

A run of a Büchi automaton $A$ over an infinite trace $\pi = \pi_0, \pi_1, \pi_2, \ldots \in \Sigma^\omega$ is a sequence
$q_0, q_1, q_2, \ldots$ of states such that $q_0 \in Q_0$, and $(q_i, \pi_i, q_{i+1}) \in \delta$ for all $i \geq 0$. $A$ accepts
$\pi$ if the run over $\pi$ visits states in every set in $F$ infinitely often. We denote the set of
infinite traces accepted by $A$ by $\mathcal{L}_\omega(A)$.

A trace satisfying LTL formula $f$ is an infinite run over the alphabet $\Sigma = 2^{Prop}$, where
$Prop$ is the underlying set of atomic propositions. We denote by $models(f)$ the set of
traces satisfying $f$. The next theorem relates the expressive power of LTL to that of
Büchi automata.

Theorem 1 [44] Given an LTL formula $f$, we can construct a generalized Büchi automaton
$A_f = (Q, \Sigma, \delta, Q_0, F)$ such that $|Q|$ is in $2^{O(|f|)}$, $\Sigma = 2^{Prop}$, and $\mathcal{L}_\omega(A_f)$ is exactly
$models(f)$.

This theorem reduces LTL satisfiability checking to automata-theoretic nonemptiness
checking, as $f$ is satisfiable iff $models(f) \neq \emptyset$ iff $\mathcal{L}_\omega(A_f) \neq \emptyset$.

LTL satisfiability checking relates to LTL model checking as follows. We use a
universal model $M$ that generates all traces over $Prop$ such that $\mathcal{L}_\omega(M) = (2^{Prop})^\omega$.
The code for this model appears in [35] and Appendix B. We now have that $M$ does not
satisfy $\neg f$ iff $f$ is satisfiable. We use a symbolic model checker to check the formula $\neg f$
against $M$; $f$ is satisfiable precisely when the model checker finds a counterexample.

CGH encoding In this paper we focus on LTL to symbolic Büchi automata compilation.
We recap the CGH encoding [10], which assumes that the formula $f$ is in BNF, and then
forms a symbolic GBA. We first define the CGH-closure of an LTL formula $f$ as the set
of all subformulas of $f$ (including $f$ itself), where we also add the formula $\mathcal{X}(g\,\mathcal{U}\,h)$
for each subformula of the form $g\,\mathcal{U}\,h$. The $\mathcal{X}$-formulas in the CGH-closure of $f$ are
called elementary formulas.
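
As a rough illustration of the CGH-closure just defined, the following sketch collects all subformulas of a BNF formula, adds $\mathcal{X}(g\,\mathcal{U}\,h)$ for every $\mathcal{U}$-subformula, and then picks out the elementary formulas. The nested-tuple representation and the function names `cgh_closure` and `elementary` are assumptions made for this example only.

```python
# Hypothetical formula representation (an assumption of this sketch):
#   ("ap", "a"), ("not", g), ("or", g1, g2), ("X", g), ("U", g1, g2), ("F", g)


def cgh_closure(f):
    """All subformulas of f (including f), plus X(g U h) for each g U h."""
    closure = set()
    stack = [f]
    while stack:
        h = stack.pop()
        if h in closure:
            continue
        closure.add(h)
        if h[0] == "U":
            # The extra X-formula required by the CGH-closure.
            stack.append(("X", h))
        if h[0] != "ap":
            # Visit the operands; atomic propositions have none.
            stack.extend(h[1:])
    return closure


def elementary(closure):
    """The X-formulas of the closure; each would receive an EL variable."""
    return {h for h in closure if h[0] == "X"}


a, b = ("ap", "a"), ("ap", "b")
until = ("U", b, ("not", a))
# BNF form of (X a) & (b U !a), namely !(!(X a) | !(b U !a)).
f = ("not", ("or", ("not", ("X", a)), ("not", until)))
print(elementary(cgh_closure(f)))
```

For this example the elementary formulas are $\mathcal{X}a$ and $\mathcal{X}(b\,\mathcal{U}\,\neg a)$, one of which comes from the formula itself and the other from the closure rule for $\mathcal{U}$.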

We declare a Boolean SMV variable $EL_{\mathcal{X}g}$ for each elementary formula $\mathcal{X}g$ in the
CGH-closure of $f$. Also, each atomic proposition in $f$ is declared as a Boolean SMV
variable. We define an auxiliary variable $S_h$ for every formula $h$ in the CGH-closure
of $f$. (Auxiliary variables are substituted away by SMV and do not require allocated
BDD variables.) The characteristic function for an auxiliary variable $S_h$ is defined as
follows:

$S_h = p$ if $p \in AP$
$S_h = \,!S_g$ if $h = \neg g$
$S_h = S_{g_1} \,|\, S_{g_2}$ if $h = g_1 \vee g_2$
$S_h = EL_h$ if $h$ is a formula $\mathcal{X}g$
$S_h = S_{g_2} \,|\, (S_{g_1} \,\&\, S_{\mathcal{X}(g_1\,\mathcal{U}\,g_2)})$ if $h = g_1\,\mathcal{U}\,g_2$

We now generate the SMV model $M_f$.

MODULE main
VAR
  a : boolean;       /*declare a Boolean var for each atomic prop in f*/
  EL_Xg : boolean;   /*declare a Boolean var for every formula Xg in the CGH-closure*/
DEFINE               /*auxiliary vars according to characteristic function*/
  S_h := ...
TRANS                /*for every formula Xg in the CGH-closure, add a transition constraint*/
  (S_Xg = next(S_g))
FAIRNESS !S_gUh | S_h   /*for each subformula gUh*/
FAIRNESS TRUE           /*or a generic fairness condition otherwise*/
SPEC !(S_f & EG true)   /*end with a SPEC statement*/

The traces of $M_f$ correspond to the accepting runs of $A_f$, starting from arbitrary states.
Thus, satisfiability of $f$ corresponds to nonemptiness of $M_f$, starting from an initial
state. We can model check such nonemptiness with SPEC !(S_f & EG true). A
counterexample is an infinite trace starting at a state where S_f holds. Thus, the model checker
returns a counterexample that is a trace satisfying $f$.

Remark 1 While the syntax we use is shared by CadenceSMV and NuSMV, the precise 
semantics of CTL model checking in these model checkers is not fully documented and
there are some subtle but significant differences between the two tools. Therefore, we 
use CadenceSMV semantics here and describe these subtleties in Appendix C. 


3 A Symbolic Transition-Based Generalized Büchi Automata
(TGBA) Encoding

We now introduce a novel symbolic encoding, referred to as TGBA, inspired by the
explicit-state transition-based Generalized Büchi automata of [22]. Such automata are
used by SPOT [15], which was shown experimentally [35] to be the best explicit LTL
translator for satisfiability checking.

Definition 4 A Transition-Based Generalized Büchi Automaton (TGBA) is a
quintuple $(Q, \Sigma, \delta, Q_0, F)$, where:

• $Q$ is a finite set of states.
• $\Sigma$ is a finite alphabet.
• $\delta \subseteq Q \times \Sigma \times Q$ is a transition relation.
• $Q_0 \subseteq Q$ is a set of initial states.
• $F \subseteq 2^{\delta}$ is a set of accepting transition sets.


A run of a TGBA over an infinite trace $\pi = \pi_0, \pi_1, \pi_2, \ldots \in \Sigma^\omega$ is a sequence $(q_0, \pi_0, q_1)$,
$(q_1, \pi_1, q_2)$, $(q_2, \pi_2, q_3)$, $\ldots$ of transitions in $\delta$ such that $q_0 \in Q_0$. The automaton accepts
$\pi$ if it has a run over $\pi$ that traverses some transition from each set in $F$ infinitely often.

The next theorem relates the expressive power of LTL to that of TGBAs.

Theorem 2 [12, 22] Given an LTL formula $f$, we can construct a TGBA $A_f = (Q, \Sigma, \delta,
Q_0, F)$ such that $|Q|$ is in $2^{O(|f|)}$, $\Sigma = 2^{Prop}$, and $\mathcal{L}_\omega(A_f)$ is exactly $models(f)$.

Expressing acceptance conditions in terms of transitions rather than states enables a
significant reduction in the size of the automata corresponding to LTL formulas [12, 22].

Our new encoding of symbolic automata, based on TGBAs, assumes that the input
formula $f$ is in NNF. (This is due to the way that the satisfaction of $\mathcal{U}$-formulas is
handled by means of promise variables; see below.) As in CGH, we first define the
closure of an LTL formula $f$. In the case of TGBAs, however, we simply define the
closure to be the set of all subformulas of $f$ (including $f$ itself). Note that, unlike in the
CGH encoding, $\mathcal{U}$- and $\Diamond$-formulas do not require the introduction of new $\mathcal{X}$-formulas.

The set of elementary formulas now contains: $f$; all $\mathcal{U}$-, $\mathcal{R}$-, $\Diamond$-, $\Box$-, and
$\Box\Diamond$-subformulas in the closure of $f$, as well as all subformulas $g$ where $\mathcal{X}g$ is in the closure
of $f$. Note that we treat the common $\Box\Diamond$ combination as a single operator.

Again, we declare a Boolean SMV variable $EL_g$ for every elementary formula $g$
as well as Boolean variables for each atomic proposition in $f$. In addition, we declare
a Boolean SMV promise variable $P_g$ for every $\mathcal{U}$-, $\Diamond$-, and $\Box\Diamond$-subformula in the
closure. These formulas are used to define fairness conditions. Intuitively, $P_g$ holds
when $g$ is a promise for the future that is not yet fulfilled. If $P_g$ does not hold, then the
promise must be fulfilled immediately. To ensure satisfaction of eventualities we require
that each promise variable $P_g$ is false infinitely often. The TGBA encoding creates fewer
EL variables than the CGH encoding, but it does add promise variables.

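Again, we define an auxiliary variable $S_h$ for every formula $h$ in the closure of $f$. The
characteristic function for $S_h$ is defined as in the CGH encoding, with the following
changes:

$S_h = S_{g_1} \,\&\, S_{g_2}$ if $h = g_1 \wedge g_2$
$S_h = next(EL_g)$ if $h = \mathcal{X}g$
$S_h = S_{g_2} \,|\, (S_{g_1} \,\&\, P_{g_1\,\mathcal{U}\,g_2} \,\&\, next(EL_{g_1\,\mathcal{U}\,g_2}))$ if $h = g_1\,\mathcal{U}\,g_2$
$S_h = S_{g_2} \,\&\, (S_{g_1} \,|\, next(EL_{g_1\,\mathcal{R}\,g_2}))$ if $h = g_1\,\mathcal{R}\,g_2$
$S_h = S_g \,\&\, next(EL_{\Box g})$ if $h = \Box g$
$S_h = S_g \,|\, (P_{\Diamond g} \,\&\, next(EL_{\Diamond g}))$ if $h = \Diamond g$
$S_h = next(EL_{\Box\Diamond g}) \,\&\, (S_g \,|\, P_{\Box\Diamond g})$ if $h = \Box\Diamond g$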

Since we reason directly over the temporal subformulas of $f$ (and not over $\mathcal{X}g$ for
temporal subformula $g$ as in CGH), the transition relation associates elementary
formulas with matching elements of our characteristic function. Finally, we generate our
symbolic TGBA; here is our SMV model $M_f$.

MODULE main
VAR      /*declare a boolean variable for each atomic proposition in f*/
  a : boolean;
VAR      /*declare a new variable for each elementary formula*/
  EL_f : boolean;    /*f is the input LTL formula*/
  EL_g1 : boolean;   /*g is an X-, F-, U-, or GF-formula*/
DEFINE   /*characteristic function definition*/
  S_g := ...
TRANS    /*for each EL-var, generate a line here*/
  ( EL_g1 = S_g1 ) & ...   /*a line for every EL variable*/
FAIRNESS (!P_g1)   /*fairness constraint for each promise variable*/
FAIRNESS TRUE      /*only needed if there are no promise variables*/
SPEC !(EL_f & EG TRUE)

Symbolic TGBAs can only be created for NNF formulas because the model checker
tries to guess a sequence of values for each of the promise variables to satisfy the
subformulas, which does not work for negative $\mathcal{U}$-formulas. (This is also the case for
explicit-state model checking; SPOT also requires NNF for TGBA encoding [12].) Consider the
formula $f = \neg(a\,\mathcal{U}\,b)$ and the trace (a=1, b=0), (a=1, b=1), ... Clearly, $(a\,\mathcal{U}\,b)$ holds
in the trace, so $f$ fails in the trace. If, however, we chose P_aUb to be false at time 0,
then EL_aUb is false at time 0, which means that $f$ holds at time 0. The correctness of
our construction is summarized by the following theorem.

Theorem 3 Let $M_f$ be the SMV program made by the TGBA encoding for LTL formula
$f$. Then $M_f$ does not satisfy the specification !(EL_f & EG true) iff $f$ is satisfiable.

The proof of this theorem appears in Appendix D. 


4 A Set of 30 Symbolic Automata Encodings 

Our novel encodings are combinations of four components: (1) Normal Form: BNF or 
NNF, described above, (2) Automaton Form: GBA or TGBA, described above, (3) Tran- 
sition Form: fussy or sloppy, described below, and (4) Variable Order: default, naive, 
LEXP, LEXM, MCS-MIN, MCS-MAX, described below. In total, we have 30 novel encodings, 
since BNF can only be used with fussy-encoded GBAs, as explained below. CGH cor- 
responds to BNF/fussy/GBA; we encode this combination with all six variable orders. 

Automaton Form As discussed earlier, CGH is based on GBA, in combination with
BNF. We can, however, also combine GBA with NNF. For this, we need to expand the
characteristic function for symbolic GBAs in order to form them from NNF formulas:

$S_h = S_{g_1} \,\&\, S_{g_2}$ if $h = g_1 \wedge g_2$
$S_h = S_{g_2} \,\&\, (S_{g_1} \,|\, S_{\mathcal{X}(g_1\,\mathcal{R}\,g_2)})$ if $h = g_1\,\mathcal{R}\,g_2$
$S_h = S_g \,\&\, S_{\mathcal{X}(\Box g)}$ if $h = \Box g$
$S_h = S_g \,|\, S_{\mathcal{X}(\Diamond g)}$ if $h = \Diamond g$



Kristin Y. Rozier and Moshe Y. Vardi 


Since our focus here is on symbolic encoding, PANDA, unlike CadenceSMV, does 
not apply formula rewriting and related optimizations; rather, PANDA’s symbolic au- 
tomata are created directly from the given normal form of the formula. Formula rewrit- 
ing may lead to further improvement in PANDA’s performance. 

Sloppy Encoding: A Novel Transition Form CGH employs iff-transitions, of the form
TRANS (EL_g = (S_g)). We refer to this as fussy encoding. For formulas in NNF, we can
use only-if transitions of the form TRANS (EL_g -> (S_g)), which we refer to as sloppy
encoding. A similar idea was shown to be useful in the context of modal satisfiability
solving [29]. Sloppy encoding increases the level of non-determinism, yielding a looser,
less constrained encoding of symbolic automata, which in many cases results in smaller
BDDs. A side-by-side example of the differences between GBA and TGBA encodings
(demonstrating the sloppy transition form) for formula $f = ((\mathcal{X}a) \wedge (b\,\mathcal{U}\,(\neg a)))$ is given
in Figures 1-2.
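
To make the phrase "looser, less constrained" concrete, the sketch below enumerates the truth assignments permitted by a single fussy constraint and by a single sloppy constraint. It is a toy illustration only: `el` and `s` stand for the values of one EL variable and its characteristic function in one step, and the function names are hypothetical. It says nothing about BDD sizes, which depend on the whole model.

```python
from itertools import product


def fussy(el, s):
    """Iff-transition, as in TRANS (EL_g = S_g)."""
    return el == s


def sloppy(el, s):
    """Only-if transition, as in TRANS (EL_g -> S_g)."""
    return (not el) or s


assignments = list(product([False, True], repeat=2))
fussy_ok = {pair for pair in assignments if fussy(*pair)}
sloppy_ok = {pair for pair in assignments if sloppy(*pair)}

# Every assignment allowed by the fussy constraint is also allowed by the
# sloppy one; the sloppy form additionally permits EL_g false while S_g holds.
print(sorted(fussy_ok))
print(sorted(sloppy_ok - fussy_ok))
```

The one extra assignment is exactly the added non-determinism: the automaton may decline to commit to a subformula even when it happens to hold.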


MODULE main
/* formula: ((X (a)) & ((b) U (!(a)))) */
VAR      /*a Boolean var for each prop in f*/
  a : boolean;
  b : boolean;
VAR      /*a var EL_X_g for each formula (X g) in
           el_list w/primary op X, U, R, G, or F*/
  EL_X_a : boolean;
  EL_X__b_U_NOT_a : boolean;
DEFINE   /*each S_h in the characteristic function*/
  S__X_a_AND__b_U_NOT_a :=
    (EL_X_a) & (S__b_U_NOT_a);
  S__b_U_NOT_a :=
    (!(a)) | (b & EL_X__b_U_NOT_a);
TRANS    /*a line for each (X g) in el_list*/
  ( EL_X_a -> (next(a)) ) &
  ( EL_X__b_U_NOT_a -> (next(S__b_U_NOT_a)) )
FAIRNESS ( !S__b_U_NOT_a | (!(a)) )
SPEC !(S__X_a_AND__b_U_NOT_a & EG TRUE)

Fig. 1. NNF/sloppy/GBA encoding for CadenceSMV


MODULE main
/* formula: ((X (a)) & ((b) U (!(a)))) */
VAR      /*a Boolean var for each prop in f*/
  a : boolean;
  b : boolean;
VAR      /*a var for each EL_var in el_list*/
  EL__X_a_AND__b_U_NOT_a : boolean;
  P__b_U_NOT_a : boolean;
  EL__b_U_NOT_a : boolean;
DEFINE   /*each S_h in the characteristic function*/
  S__X_a_AND__b_U_NOT_a :=
    (S_X_a) & (EL__b_U_NOT_a);
  S_X_a := (next(a));
  S__b_U_NOT_a := ( (!(a))
    | (b & P__b_U_NOT_a & (next(EL__b_U_NOT_a))) );
TRANS    /*a line for each EL_var in el_list*/
  ( EL__X_a_AND__b_U_NOT_a ->
      (S__X_a_AND__b_U_NOT_a) ) &
  ( EL__b_U_NOT_a -> (S__b_U_NOT_a) )
FAIRNESS ( !P__b_U_NOT_a )
SPEC !(EL__X_a_AND__b_U_NOT_a & EG TRUE)

Fig. 2. NNF/sloppy/TGBA encoding for CadenceSMV


A New Way of Choosing BDD Variable Orders Symbolic model checkers search for 
a fair trace in the model-automaton product using a BDD-based fixpoint algorithm, a 
process whose efficacy is highly sensitive to variable order [5]. Finding an optimal BDD 
variable order is NP-hard, and good heuristics for variable ordering are crucial. 

Recall that we define state variables in the symbolic model for only certain
subformulas: $p \in AP$, EL_g, and P_g for some subformulas $g$. We form the variable graph by
identifying nodes in the input-formula parse tree that correspond to the primary
operators of those subformulas. Since we declare different variables for the GBA and TGBA
encodings, the variable graph for a formula $f$ may vary depending on the automaton
form we choose. Figure 3 displays the GBA and TGBA variable graphs for an example
formula, overlaid on the parse tree for this formula. We connect each variable-labeled
vertex to its closest variable-labeled vertex descendant(s), skipping over vertices in the
parse tree that do not correspond to state variables in our automaton construction. We
create one node per subformula variable, irrespective of the number of occurrences of
the subformula; for example, we create only one node for the proposition $a$ in Figure 3.

Fig. 3. (a) GBA variable graph; (b) TGBA variable graph. Graphs in (a) and (b) were both
formed from the parse tree for $f = ((\mathcal{X}a) \wedge (b\,\mathcal{U}\,\neg a))$.

We implement five variable ordering schemes, all of which take the variable graph
as input. We compare these to the default heuristic of CadenceSMV. The naive variable
order is formed directly from a pre-order, depth-first traversal of the variable graph. We
derive four additional variable-ordering heuristics by repurposing node-ordering
algorithms designed for graph triangulation [26] (see footnote 3). We use two variants of a lexicographic
breadth-first search algorithm: variants perfect (LEXP) and minimal (LEXM). LEXP labels
each vertex in the variable graph with its already-ordered neighbors; the unordered
vertex with the lexicographically largest label is selected next in the variable order. LEXM
operates similarly, but labels unordered vertices with both their neighbors and also all
vertices that can be reached by a path of unordered vertices with smaller labels. The
maximum-cardinality search (MCS) variable ordering scheme differs in the vertex selection
criterion, selecting next the vertex in the variable graph adjacent to the highest number
of already ordered vertices. We seed MCS with an initial vertex, chosen either to
have the maximum (MCS-MAX) or minimum (MCS-MIN) degree.
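
The naive and MCS orders described above can be sketched in a few lines. This is a minimal sketch, not the graph triangulation implementation used in the experiments: the example variable graph, the variable names, the helper `undirected`, and the rule of breaking ties by first occurrence are all assumptions made for illustration.

```python
def naive_order(tree, root):
    """Pre-order, depth-first traversal; each variable is listed once."""
    order, seen, stack = [], set(), [root]
    while stack:
        v = stack.pop()
        if v in seen:
            continue
        seen.add(v)
        order.append(v)
        # Push children in reverse so the leftmost child is visited first.
        stack.extend(reversed(tree.get(v, [])))
    return order


def undirected(tree):
    """Turn a parent -> children map into an undirected adjacency map."""
    graph = {}
    for parent, children in tree.items():
        for child in children:
            graph.setdefault(parent, set()).add(child)
            graph.setdefault(child, set()).add(parent)
    return graph


def mcs_order(graph, seed="max"):
    """Maximum-cardinality search seeded at a max- or min-degree vertex."""
    pick = max if seed == "max" else min
    vertices = list(graph)
    first = pick(vertices, key=lambda v: len(graph[v]))
    order, ordered = [first], {first}
    while len(order) < len(vertices):
        rest = [v for v in vertices if v not in ordered]
        # Select the vertex adjacent to the most already-ordered vertices
        # (ties are broken by first occurrence, an assumption of this sketch).
        nxt = max(rest, key=lambda v: len(graph[v] & ordered))
        order.append(nxt)
        ordered.add(nxt)
    return order


# Hypothetical variable graph for (X a) & (b U !a); names are illustrative.
tree = {"EL_f": ["a", "EL_b_U_NOT_a"], "EL_b_U_NOT_a": ["b", "a"]}
graph = undirected(tree)
print(naive_order(tree, "EL_f"))
print(mcs_order(graph, seed="max"))
print(mcs_order(graph, seed="min"))
```

The resulting list would then be written out as a variable order file for the model checker; that file format is tool-specific and is not shown here.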

5 Experimental Methodology 

Test Methods Each test was performed in two steps. First, we applied our symbolic 
encodings to the input formula. Second, each symbolic automaton and variable order 
file pair was checked by CadenceSMV. Since encoding time is minimal and heavily 
dominated by model-analysis time (the time to check the model for nonemptiness to 
determine LTL satisfiability) we focus exclusively on the latter here. 

Platform We ran all tests on Shared University Grid at Rice (SUG@R), an Intel Xeon
compute cluster (see footnote 4). SUG@R is comprised of 134 SunFire x4150 nodes, each with two
quad-core Intel Xeon processors running at 2.83GHz and 16GB of RAM per processor.
The OS is Red Hat Enterprise 5 Linux, 2.6.18 kernel. Each test was run with exclusive
access to one node. Times were measured using the Unix time command.

Input Formulas We employed a widely-used [7, 14, 23, 35] collection of benchmark 
formulas, established by [35]. All encodings were tested using three types of scalable 
formulas: random, counter, and pattern. Definitions of these formulas are repeated for 
convenience in Appendix B. Our test set includes 4 counter and 9 pattern formula
variations, each of which scales to a large number of variables, and 60,000 random formulas.

3 Graph triangulation implementation coded by the Kavraki Lab at Rice University. 

4 http://rcsg.rice.edu/sugar/ 

Correctness In addition to proving the correctness of our algorithm, we established the
correctness of our implementation by comparing, for every formula in our large
benchmark suite, the results (either SAT or UNSAT) returned by all encodings studied
here, as well as the results returned by CadenceSMV for checking the same formula as
an LTL specification for the universal model. We never encountered an inconsistency.


6 Experimental Results 

Our experiments demonstrate that the novel encoding methods we have introduced sig- 
nificantly improve the translation of LTL formulas to symbolic automata, as measured 
in time to check the resulting automata for nonemptiness and the size of the state space 
we can check. No single encoding, however, consistently dominates for all types of for- 
mulas. Instead, we find that different encodings are better suited to different formulas. 
Therefore, we recommend using a multi-encoding approach, a variant of the multi- 
engine approach [33], of running all encodings in parallel and terminating when the 
first job completes. We call our tool PANDA for “Portfolio Approach to Navigate the 
Design of Automata.” 
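
The multi-encoding idea (start every encoding, keep the first answer) can be sketched as follows. This is a hedged illustration, not PANDA's implementation: `run_portfolio` and `make_stub` are hypothetical names, and the stub checkers merely sleep and return a fixed verdict instead of invoking a model checker.

```python
import time
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait


def run_portfolio(checkers, formula):
    """Run every checker on the formula; return (name, verdict) of the first to finish."""
    pool = ThreadPoolExecutor(max_workers=len(checkers))
    futures = {pool.submit(check, formula): name for name, check in checkers.items()}
    done, _pending = wait(futures, return_when=FIRST_COMPLETED)
    # Threads cannot be killed; a real tool would run separate processes
    # and terminate the losing ones at this point.
    pool.shutdown(wait=False)
    winner = next(iter(done))
    return futures[winner], winner.result()


def make_stub(delay, verdict):
    """Stand-in for one encoding plus model-checker run (an assumption)."""
    def check(formula):
        time.sleep(delay)
        return verdict
    return check


checkers = {
    "NNF/sloppy/TGBA/LEXP": make_stub(0.05, "SAT"),
    "BNF/fussy/GBA/default": make_stub(0.5, "SAT"),
}
print(run_portfolio(checkers, "some formula"))
```

Because all encodings are sound, whichever job finishes first gives the answer, so the portfolio's running time tracks the best encoding for each formula without having to predict it.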

Seven configurations are not competitive. While we cannot predict the best encodings, 
we can reliably predict the worst. The following encodings were never optimal for any 
formulas in our test set. Thus, out of our 30 possible encodings, we rule out these seven: 

- BNF/fussy/GBA/LEXM (essentially CGH with LEXM)
- NNF/fussy/GBA/LEXM
- NNF/fussy/TGBA/LEXM
- NNF/sloppy/GBA/LEXM
- NNF/fussy/TGBA/MCS-MAX
- NNF/sloppy/TGBA/MCS-MAX
- NNF/sloppy/TGBA/MCS-MIN

NNF is the best normal form, most (but not all) of the time. NNF encodings were 
always better for all counter and pattern formulas; see, for example, Figure 4. Figure 5 
demonstrates the use of both normal forms in the optimal encodings chosen by PANDA 
for random formulas. BNF encodings were occasionally significantly better than NNF; 
the solid point in Figure 5 corresponds to a formula for which the best BNF encoding 
was more than four times faster than the best NNF encoding. NNF was best much more 
often than BNF, likely because using NNF has the added benefit that it allows us to 
employ our sloppy encoding and TGBAs, which often carry their own performance 
advantages. 

No automaton form is best. Our TGBA encodings dominated for $R_2$, $S$, and $U$ pattern
formulas and both types of 3-variable counter formulas. For instance, the log-scale plot
in Figure 6 shows that PANDA’s median model analysis time for $R_2$ pattern formulas
grows subexponentially as a function of the number of variables, while CadenceSMV’s
median model analysis time for the same formulas grows exponentially. (The best of
PANDA’s GBA encodings is also graphed for comparison.) GBA encodings are better
for other pattern formulas, both types of 2-variable counter formulas, and the majority
of random formulas; Figure 7 demonstrates this trend for random formulas of length 180.

Fig. 4. Median model analysis time for $R(n) = \bigwedge_{i=1}^{n} (\Box\Diamond p_i \vee \Diamond\Box p_{i+1})$ for PANDA
NNF/sloppy/GBA/naive, CadenceSMV, and the best BNF encoding.

Fig. 5. Best encodings of 500 3-variable, 160 length random formulas. Points fall below the
diagonal when NNF is better.

Fig. 6. [...] PANDA’s NNF/sloppy/TGBA/LEXP encoding scales better than the best GBA
encoding, NNF/sloppy/GBA/naive, and exponentially better than CadenceSMV.

Fig. 7. [...] length random formulas.

No transition form is best. Sloppy is the best transition form for all pattern formulas. For 
instance, the log-scale plot of Figure 8 illustrates that PANDA’s median model analysis 
time for U pattern formulas grows subexponentially as a function of the number of vari- 
ables, while CadenceSMV’s median model analysis time for the same formulas grows 
exponentially. Fussy encoding is better for all counter formulas. The best encodings of 
random formulas were split between fussy and sloppy. Figure 9 demonstrates this trend 
for 140 length random formulas. 

Fig. 8. $U(n) = (\ldots(p_1 \, U \, p_2) \, U \ldots) \, U \, p_n$. PANDA’s NNF/sloppy/TGBA/LEXP scales exponentially better than CadenceSMV. (Plot title: U Pattern Formulas.)

Fig. 9. Best encodings of 500 3-variable, 140 length random formulas. Points fall below the diagonal when sloppy encoding is best. (Plot title: Best fussy encoding vs best sloppy encoding: 3-variable, 140 length random formulas; axis: Fussy Encodings Model Analysis Times (sec).)

No variable order is best, but LEXM is worst. The best encodings for our benchmark
formula set were split between five variable orders. The naive and default orders proved
optimal for more random formulas than the other orders. Figure 10 demonstrates that
neither the naive order nor the default order is better than the other for random formulas. 
The naive order was optimal for E, Q, R, U_2, and S patterns. MCS-MAX was optimal for 2-
and 3-variable linear counters. The LEXP variable order dominated for C_1, C_2, U, and
R_2 pattern formulas, as well as for 2- and 3-variable counter formulas, yet it was rarely
best for random formulas. Figure 11 demonstrates the marked difference in scalability
provided by using the LEXP order over running CadenceSMV on 3-variable counter 
formulas. We can analyze much larger models with PANDA using LEXP than with the 
native CadenceSMV encoding before memory-out. We never found the LEXM order to 
be the single best encoding for any formula. 


Fig. 10. Best encodings of 500 3-variable, 195 length random formulas. Points fall above the diagonal when naive variable order is best. (Plot title: Best encodings with naive vs default variable orders: 3-variable, 195 length random formulas; axis: Naive Encodings Model Analysis Times (sec).)

Fig. 11. Maximum states analyzed before space-out. CadenceSMV quits at 10240 states; PANDA’s NNF/fussy/TGBA/LEXP scales to 491520 states. (Plot title: 3-variable Counter Formulas.)

A formula class typically has a best encoding, but predictions are difficult. While each
of our pattern and counter formulas had a best encoding (or a pair of best encodings) that
remained consistent as we scaled the formulas, we found that we could not reliably
predict the best encoding using any statistics gathered from parsing, such as operator
counts or ratios. For example, we found that the best encoding for a pattern formula
was not necessarily the best for a randomly generated formula composed of the same
temporal operators. We surmise that the best encoding is tied to the structure of the
formula on a deeper level; developing an accurate heuristic is left to future work.

There is no single best encoding; a multi-encoding approach is clearly superior. We
implement a novel multi-encoding approach: our new PANDA tool creates several
encodings of a formula and uses a symbolic model checker to check them for satisfiability
in parallel, terminating when the first check completes. Our experimental data supports
this multi-encoding approach. Figures 4, 6, and 8 highlight the significant decrease in
CadenceSMV model analysis time for R, R_2, and U pattern formulas, while Figure 11
demonstrates increased scalability in terms of state space using counter formulas.
Altogether, we demonstrate that a multi-encoding approach is dramatically more scalable
than the current state of the art. The increase in scalability depends on the specific
formula; for some formula classes, PANDA’s model analysis time is exponentially
better than CadenceSMV’s.
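
The "first check to complete wins" strategy described above can be sketched as a race between processes. This is a minimal illustration, not PANDA's actual implementation: the function `race`, the helper `fake_checker`, and the use of sleeping Python subprocesses as stand-ins for model checker runs are all assumptions made for the example, and no real CadenceSMV command line is shown.

```python
import subprocess
import sys
import time
from typing import Dict, List, Tuple


def race(commands: Dict[str, List[str]], poll_interval: float = 0.01) -> Tuple[str, int]:
    """Start one process per encoding; return (name, exit code) of the first to finish."""
    if not commands:
        raise ValueError("at least one command is required")
    procs = {name: subprocess.Popen(cmd) for name, cmd in commands.items()}
    try:
        while True:
            for name, proc in procs.items():
                code = proc.poll()  # None while the process is still running
                if code is not None:
                    return name, code
            time.sleep(poll_interval)
    finally:
        # Once a winner is known (or on error), stop every check still running.
        for proc in procs.values():
            if proc.poll() is None:
                proc.kill()
            proc.wait()


def fake_checker(seconds: float) -> List[str]:
    """Hypothetical stand-in for a model checker run on one encoding."""
    return [sys.executable, "-c", f"import time; time.sleep({seconds})"]


if __name__ == "__main__":
    winner, code = race({
        "NNF/sloppy/TGBA/LEXP": fake_checker(0.1),
        "NNF/sloppy/GBA/naive": fake_checker(5.0),
    })
    print(winner, code)
```

Using separate processes rather than threads means the losing checks can be killed outright, so the total wall-clock time tracks the fastest encoding instead of the slowest.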


7 Discussion 

This paper brought attention to the issue of scalable construction of symbolic automata
for LTL formulas in the context of LTL satisfiability checking. We defined novel
encodings and novel BDD variable orders for accomplishing this task. We explored the
impact of these encodings, which combine normal forms, automaton forms, and
transition forms with variable orders, and showed that each can
have a significant impact on performance. At the same time, we showed that no single
encoding outperforms all others and that a multi-encoding approach yields the
best result, consistently outperforming the native translation of CadenceSMV.

We do not claim to have exhaustively covered the space of possible encodings 
of symbolic automata. Several papers on the automata-theoretic approach to LTL de- 
scribe approaches that could be turned into alternative encodings of symbolic automata, 
cf. [4, 18, 20, 37]. The advantage of the multi-encoding approach we introduced here is
its extensibility: adding further encodings is straightforward. The multi-encoding
approach can also be combined with different back ends. In this paper we used Ca- 
denceSMV as a BDD-based back end; using another symbolic back end (cf. [14]) or 
a SAT-based back end (cf. [3]) would be an alternative approach, as both BDD-based 
and SAT-based back ends require symbolic automata. Since LTL serves as the basis for 
industrial languages such as PSL and SVA, the encoding techniques studied here may 
also serve as the basis for novel encodings of such languages, cf. [8, 9]. 

In this paper we examined our novel symbolic encodings of LTL in the context 
of satisfiability checking. An important difference between satisfiability checking and 
model checking is that in the former we expect to have to handle much larger formulas, 
since we need to consider the conjunction of properties. Also, in model checking the
size of the symbolic automata can be dwarfed by the size of the model under verifica- 
tion. Thus, the issue of symbolic encoding of automata in the context of model checking 
deserves a separate investigation. 